Multi-Modal Emotion Recognition Fusing Video and Audio

نویسندگان

  • Chao Xu
  • Pufeng Du
  • Zhiyong Feng
  • Zhaopeng Meng
  • Tianyi Cao
  • Caichao Dong
چکیده

Emotion plays an important role in human communications. We construct a framework for multi-modal fusion emotion recognition. Facial expression features and speech features are respectively extracted from image sequences and speech signals. In order to locate and track facial feature points, we construct an Active Appearance Model for facial images with all kinds of expressions. Facial Animation Parameters are calculated from motions of facial feature points as expression features. We extract short-term mean energy, fundamental frequency and formant frequencies from each frame as speech features. An emotion classifier is designed to fuse facial expression and speech based on Hidden Markov Models and Multi-layer Perceptron. Experiments indicate that multi-modal fusion emotion recognition algorithm which is presented in this paper has relatively high recognition accuracy. The proposed approach has better performance and robustness than methods using only video or audio separately.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Towards Efficient Multi-Modal Emotion Recognition

The paper presents a multi‐modal emotion recognition system exploiting audio and video (i.e., facial expression) information. The system first processes both sources of information individually to produce corresponding matching scores and then combines the computed matching scores to obtain a classification decision. For the video part of the system, a novel ...

متن کامل

Multi-modal audio-visual event recognition for football analysis

The recognition of events within multi-modal data is a challenging problem. In this paper we focus on the recognition of events by using both audio and video data. We investigate the use of data fusion techniques in order to recognise these sequences within the framework of Hidden Markov Models (HMM) used to model audio and video data sequences. Specifically we look at the recognition of play a...

متن کامل

Visual and Audio Aware Bi-Modal Video Emotion Recognition

With rapid increase in the size of videos online, analysis and prediction of affective impact that video content will have on viewers has attracted much attention in the community. To solve this challenge several different kinds of information about video clips are exploited. Traditional methods normally focused on single modality, either audio or visual. Later on some researchers tried to esta...

متن کامل

Spatiotemporal Networks for Video Emotion Recognition

Our article presents an audio-visual based multi-modal emotion classification system. Considering the fact of deep learning approaches to facial analysis have recently demonstrated high performance, in our work, we use convolutional neural networks (CNNs) for emotion recognition in video, relying on temporal averaging and pooling operations reminiscent of widely used approaches for the spatial ...

متن کامل

An Audio-Visual Approach to Music Genre Classification through Affective Color Features

This paper presents a study on classifying music by affective visual information extracted from music videos. The proposed audio-visual approach analyzes genre specific utilization of color. A comprehensive set of color specific image processing features used for affect and emotion recognition derived from psychological experiments or art-theory is evaluated in the visual and multi-modal domain...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013